Machine learning analysis of thermophysical and thermohydraulic properties in ethylene glycol- and glycerol-based SiO2 nanofluids

This study investigates the heat transfer and friction factor characteristics of ethylene glycol- and glycerol-based silicon dioxide nanofluids flowing in a circular tube under constant heat flux conditions. It addresses the need for effective thermal management in areas such as electronics cooling, the automotive industry, and renewable energy systems. Previous research has encountered difficulties in enhancing thermal performance while handling the increased friction factor associated with nanofluids. Experiments were conducted in the Reynolds number range of 1300 to 21,000 with particle volume concentrations of up to 1.0%. The nanofluids exhibited higher heat transfer coefficients and friction factors than the base liquids. The highest enhancement in heat transfer was 5.4% and 8.3% for the glycerol- and ethylene glycol-based silicon dioxide nanofluids, with relative friction factor penalties of ∼30% and 75%, respectively. To model and predict the complex, nonlinear experimental data, five machine learning approaches were used: linear regression, random forest, extreme gradient boosting, adaptive boosting, and decision tree. Among them, the decision tree-based model performed well with few errors, while the random forest and extreme gradient boosting models were also highly accurate. The findings indicate that these machine learning models can accurately predict the thermal performance of nanofluids, providing a dependable tool for improving their use in a variety of thermal systems. The findings help in designing more effective cooling solutions and improving the sustainability of energy systems.


Experimental apparatus and method

NF preparation
SiO2 NP powder was purchased from US Research Nanomaterials Inc., USA. EG and G of 99% purity were obtained from R&M Chemicals. Scanning electron microscopy (SEM) images of the SiO2 were taken using a field emission scanning electron microscope (Supra 55VP FE-SEM, Carl Zeiss). Figure 1a depicts the SEM images. As seen, the NPs are nearly spherical. The particle sizes vary between 8 and 25 nm, with an average particle size estimated as 21 nm [29]. The EDX analysis of the selected area in the SEM image is displayed in Fig. 1b. The result indicates that the composition of the SiO2 is 100% consistent with the vendor specifications.
NFs were prepared in the respective base liquids, G and EG, following a two-step method [14]. The quantity of NPs required for each of the 0.5%, 0.75%, 1.0%, and 2.0% volume fractions was weighed on an electronic balance (TLE 104E, Mettler-Toledo). The formulation was achieved by dispersing SiO2 in the base liquid in a beaker and stirring with a magnetic agitator for 30 min while adjusting the pH value. Following a similar protocol, another set of SiO2 NFs in a 60:40 (by volume) mixture of G and EG was prepared for comparison.
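As a rough illustration of the weighing step, the particle mass for a target volume fraction can be estimated from the volume-fraction definition φ = V_p/(V_p + V_bf). The sketch below is not the authors' procedure: the function name, the SiO2 density of 2220 kg/m3, and the 1 L batch size are assumptions chosen only for the example.

```python
def nanoparticle_mass(phi, base_volume_m3, particle_density):
    """Mass of particles (kg) needed to reach volume fraction phi.

    phi              : target particle volume fraction (0 <= phi < 1)
    base_volume_m3   : volume of base liquid in the batch (m^3)
    particle_density : particle density (kg/m^3), an assumed input
    """
    if not 0.0 <= phi < 1.0:
        raise ValueError("phi must lie in [0, 1)")
    # phi = Vp / (Vp + Vbf)  ->  Vp = phi * Vbf / (1 - phi)
    particle_volume = phi * base_volume_m3 / (1.0 - phi)
    return particle_density * particle_volume


if __name__ == "__main__":
    # Hypothetical batch: 1 L of base liquid, assumed SiO2 density 2220 kg/m^3
    for phi in (0.005, 0.0075, 0.01):
        mass_g = 1000.0 * nanoparticle_mass(phi, 1.0e-3, 2220.0)
        print(f"phi = {phi:.4f}: {mass_g:.2f} g")
```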
It is reported that NPs agglomerate to form clusters and settle over time due to their high surface energies [30]. According to Darzi et al. [15], kinetic energy is needed to break the particle clusters down into smaller sizes. The samples were therefore subjected to ultrasonic homogenization (Labsonic M, Sartorius AG) at a frequency of 30 kHz for 2 h to improve the stability of the dispersions. pH control is crucial for colloidal stability, as it governs the zeta potential relative to the suspension's isoelectric point (IEP). A highly acidic NF (low pH) can lead to corrosion during long-term flow in pipes. The pH of the as-prepared NF samples varied between 6 and 7. The pH was then set to 10 by adding NH4OH to keep the zeta potential away from the IEP [31]. No dispersant was added during preparation to avoid altering the properties of the NF.
The stability of the NFs was checked using a Zetasizer (Nano ZSP, Malvern), which operates on the principle of dynamic light scattering to measure charge repulsion/attraction between dispersed particles. At pH 10, the dispersions had average zeta potentials of −33 and −42 mV, indicating that the SiO2 NFs are stable. Further, a small portion of each NF was kept under static conditions for months and examined. G-based NFs were stable for over three months, whereas EG-based NFs were stable for up to one month without settling in clear storage containers.
www.nature.com/scientificreports/

Thermophysical property measurement
A thermal property analyzer (KD2 Pro, Decagon Devices) based on the transient hot-wire method was used to evaluate the effective TC of the NFs within a specified precision of 5.0%. The KS-1 probe, 60 mm long and 1.28 mm in diameter, was selected; it acts as a transient line heat source. The sample temperature was controlled using an isothermal bath (Vivo-RT2, Julabo) with temperature stability better than ±0.1 K. A rotational rheometer (MCR 302, Anton Paar) was used for effective viscosity measurement. A double-gap concentric cylinder was employed as the measuring geometry, with a gap of 1.0 mm between the co-axial cylinders. A Peltier thermostat controlled the cell temperature with a precision of ±0.1 K. Repeated tests were conducted with an LVDV-III Ultra Programmable Rheometer. A digital densitometer (DA-645, KEM), which operates on the oscillating U-tube principle, was used to determine the effective density with an accuracy of ±0.00005 g/cm3. Peltier thermoelectric elements control the temperature within the measuring cell to a precision better than 0.03 °C. A differential scanning calorimeter (DSC Q2000, TA Instruments) measured the specific heat with an accuracy of 2%. The measurement followed the standard test method (ASTM E1269) under a high-purity nitrogen atmosphere at a heating rate of 20 °C/min in the DSC furnace. The device temperature accuracy is ±0.01 °C. A refrigerated cooling system (RCS90) was used to conduct specific heat testing at different temperatures.
All devices were calibrated with either G or EG before the measurements with NFs. The data were collected over the temperature range 20-80 °C at atmospheric pressure. Three readings were obtained for each sample at each temperature, and the mean value is reported.
Table 1 shows the measured properties of G and EG. Comparisons have been made with the values reported in the literature [32-35]. The Hewitt [33] data agree well with the measured thermal conductivity values, within maximum deviations of 0.25 and 0.02% for glycerol and ethylene glycol, respectively. Hewitt's and Cabaleiro's specific heat data showed 0.9 and 5.6% variation, while the Lide thermal conductivity data deviated from the measured values by 1.3 and 0.6%. A maximum deviation in viscosity of 5.9 and 1.1% for G and EG was observed when compared with Lide [35] and Quijada-Maldonado [32], respectively, in the temperature range of 25 to 80 °C. Overall, the calibration results in all experiments deviated by less than 6% from the reference values. Table 2 presents the uncertainty of the measured thermophysical properties of glycerol (G) and ethylene glycol (EG) in percent.
A comparison of the thermophysical properties of the SiO2 NFs with the base liquids is shown in Fig. 2a-d. The TC enhancement was highest at 1.0% concentration, with values of 4.2% and 10.7% for the G- and EG-based NFs, respectively. The NFs exhibited Newtonian behavior, with viscosity independent of the shear rate, similar to that reported by Tadjorodi et al. [36] and Żyła and Jacek [37]. The viscosity of the SiO2 NFs increased by 27% and 33% compared to the base liquids G and EG. The suspension density increased by nearly 2%. The SiO2 NFs exhibited lower effective specific heat than their base fluids: the specific heat decreased by nearly 2.7% and 1.5% over a temperature range of 25 to 80 °C. Further, the measured density values of the NFs are consistent with the values calculated using the mixing theory relation, within 1.0% deviation in each case. The specific heat data deviated by 3% from the classical thermal equilibrium model. [...] (16) as a function of particle volume concentration (φ) and temperature (T) for further analysis of heat transfer.
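The mixing-theory density and the thermal-equilibrium specific heat referred to above are commonly written as ρ_nf = φρ_p + (1 − φ)ρ_bf and c_nf = [φρ_p c_p + (1 − φ)ρ_bf c_bf]/ρ_nf. The sketch below assumes these textbook forms are the ones meant; the function names and all property values are hypothetical and are not taken from the paper's tables.

```python
def mixture_density(phi, rho_p, rho_bf):
    """Mixing-theory density of the suspension (kg/m^3)."""
    return phi * rho_p + (1.0 - phi) * rho_bf


def mixture_specific_heat(phi, rho_p, cp_p, rho_bf, cp_bf):
    """Thermal-equilibrium specific heat of the suspension (J/kg K)."""
    rho_nf = mixture_density(phi, rho_p, rho_bf)
    return (phi * rho_p * cp_p + (1.0 - phi) * rho_bf * cp_bf) / rho_nf


if __name__ == "__main__":
    # Assumed illustrative properties (not measured values from the study)
    phi = 0.01
    rho_p, cp_p = 2220.0, 745.0      # particle density, specific heat
    rho_bf, cp_bf = 1110.0, 2400.0   # base-liquid density, specific heat
    print(mixture_density(phi, rho_p, rho_bf))
    print(mixture_specific_heat(phi, rho_p, cp_p, rho_bf, cp_bf))
```

With particles denser than the liquid and of lower specific heat, the sketch reproduces the qualitative trend reported in the text: density rises and specific heat falls as φ increases.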

Heat transfer test setup
The experimental system has a closed-loop design composed of five essential components: the test tube, power supply, cooling arrangement, measurement system, and data acquisition. A schematic diagram of the experimental setup is depicted in Fig. 3. The test section comprises a single stainless-steel tube with a bell-mouth entry, thermocouples, and split heaters. The tube is 1.0 m long with outer and inner diameters of 6.35 and 4.57 mm, respectively. The outer tube surface is fitted with two cylindrical split-body electric heaters. Each heater, with a 750 W maximum power rating, is wrapped with ceramic fiber insulation and connected to a variable transformer. Eight K-type thermocouples measure and record the temperatures at different locations. Two thermocouples are spot welded to the test section, each at an axial distance of 150 mm from either end of the tube, to measure the wall temperature; four thermocouples at equidistant positions of 100 mm from the tube end measure the surface temperatures; and two thermocouples are placed at the inlet and outlet of the test section. All thermocouples were calibrated to within ±0.1 °C.
The cooling unit comprises a chiller, circulating pump, water tank, and temperature control system. The chiller, rated at 0.74 kW, was connected to a plate-type heat exchanger to regulate the fluid temperature at the test section inlet. A 3.0 hp horizontal multistage pump (AB, Teral) circulated the fluid through the test section. A digital vortex flow meter (SV4200, IMF) with a range of 1-20 LPM was used to measure the fluid flow. A graduated acrylic volumetric cylinder of 1.0 L capacity was connected at the test-section exit as a working fluid reservoir to check the flow rate visually. Two absolute pressure transducers (GP-100, Keyence) were installed at the pipe inlet and outlet to measure the pressure drop (∆P) across the test section. The sensors are calibrated over the range 0 to 10 MPa to within ±1%. All measuring instruments in the circuit were connected to the data logger for recording output signals.
The experiments were undertaken at flow rates between 6 and 12 LPM, corresponding to flow Reynolds numbers of 1200-22,000, while the working fluid temperature was maintained at 80 °C. All readings were recorded at steady-state conditions. The circuit was cleaned with water and air-dried between successive experiments.

Data analysis
The heat energy $Q_i$ provided to the working fluid in the heated section is a function of the electric current $I$ and voltage $V$:

$$Q_i = V I$$

Simultaneously, the rate of heat transfer $Q_a$ was evaluated from the mass flow rate $\dot{m}$, the fluid specific heat $c_p$, and the fluid temperatures at the inlet and outlet of the tube, $T_{in}$ and $T_{out}$:

$$Q_a = \dot{m} c_p (T_{out} - T_{in})$$

Under a steady state, the energy available with the hot fluid exiting the test section should equal the heat removed by the cooling liquid in the chiller. Newton's law of cooling is used to evaluate the convective HTC as follows:

$$h = \frac{Q_a}{A (T_w - T_b)}$$

where $A$, $T_w$ and $T_b$ are the surface area, the wall temperature and the fluid bulk (average) temperature, respectively, computed as:

$$A = \pi D L, \qquad T_w = \frac{1}{N_w}\sum_{i=1}^{N_w} T_{w,i}, \qquad T_b = \frac{T_{in} + T_{out}}{2}$$

with $N_w$ the number of wall thermocouples. The average Nusselt number was estimated based on the convective HTC $h$, tube diameter $D$, and TC of the fluid $k$ as

$$Nu = \frac{h D}{k}$$

Furthermore, the turbulent HTC can be represented by an equation of the Dittus-Boelter form:

$$Nu = c\, Re^{m} Pr^{n}$$

where c, m, and n represent the coefficients suitable for the NF experimental data. The Re and Prandtl number (Pr) terms are defined as follows:

$$Re = \frac{\rho u D}{\mu}, \qquad Pr = \frac{\mu c_p}{k}$$

where $\mu$ is the dynamic viscosity. The NF properties used for heat transfer analysis are determined at the bulk temperature. Further details on the derived equations for the thermophysical properties of the NF are given in Ref. [29]. The nondimensional FF $f$ was calculated from the Darcy-Weisbach equation, which relates the ∆P, pipe length $L$, hydraulic diameter $D$, fluid density $\rho$, and average velocity $u$ as follows:

$$f = \frac{\Delta P}{\left(\dfrac{L}{D}\right)\left(\dfrac{\rho u^{2}}{2}\right)}$$
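The data reduction described in this section can be sketched in a few lines of Python. This is an illustrative sketch using the standard textbook definitions of the quantities named in the text, not the authors' script; the function name, argument names, the simple averaging of wall temperatures, and all numerical inputs are assumptions.

```python
import math


def reduce_data(voltage, current, m_dot, cp, t_in, t_out, wall_temps,
                d, length, rho, mu, k, dp):
    """Compute heat input, HTC, Nu, Re, Pr and Darcy friction factor.

    SI units are assumed throughout; wall_temps is a list of wall
    thermocouple readings that is averaged (an assumption).
    """
    q_in = voltage * current                      # electrical heat input
    q_abs = m_dot * cp * (t_out - t_in)           # heat absorbed by fluid
    area = math.pi * d * length                   # inner surface area
    t_wall = sum(wall_temps) / len(wall_temps)    # mean wall temperature
    t_bulk = 0.5 * (t_in + t_out)                 # bulk fluid temperature
    h = q_abs / (area * (t_wall - t_bulk))        # Newton's law of cooling
    u = m_dot / (rho * math.pi * d ** 2 / 4.0)    # mean velocity
    return {
        "Q_in": q_in,
        "Q_abs": q_abs,
        "h": h,
        "Nu": h * d / k,
        "Re": rho * u * d / mu,
        "Pr": mu * cp / k,
        "f": dp / ((length / d) * (rho * u ** 2 / 2.0)),  # Darcy-Weisbach
    }


if __name__ == "__main__":
    # Hypothetical operating point, for illustration only
    result = reduce_data(voltage=200.0, current=5.0, m_dot=0.1, cp=2500.0,
                         t_in=80.0, t_out=83.0, wall_temps=[95.0, 97.0],
                         d=0.00457, length=1.0, rho=1100.0, mu=0.003,
                         k=0.26, dp=20000.0)
    for name, value in result.items():
        print(f"{name}: {value:.4g}")
```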

Uncertainty analysis
Analysis of the experimental uncertainty was undertaken to validate the precision of the measurements. The uncertainties in the heat transfer characteristics were estimated based on the error approach presented by Beckwith et al. [39], following the protocol described in [17,40]. The instrument uncertainties and the uncertainties estimated for the measured parameters are presented in Table 2.

Theoretical correlations for heat transfer

Model 1
A correlation for the Nusselt number of single-phase liquids under fully developed transition and turbulent flows is given by Gnielinski [41] as:

$$Nu = \frac{(f_F/2)(Re - 1000)\,Pr}{1 + 12.7\,(f_F/2)^{0.5}\,(Pr^{2/3} - 1)} \qquad (25)$$

where the Fanning friction factor is given as $f_F = (1.58 \ln Re - 3.28)^{-2}$. Equation (25) is valid in the range $2300 \le Re \le 10^{6}$.
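A short sketch of how the Gnielinski correlation can be evaluated is given below. It uses the widely quoted Fanning-friction-factor form of the correlation together with the friction factor expression stated in the text; the function name and the sample Re and Pr values are illustrative assumptions.

```python
import math


def gnielinski_nusselt(re, pr):
    """Nusselt number from the Gnielinski correlation (Fanning form)."""
    if not 2300.0 <= re <= 1.0e6:
        raise ValueError("correlation assumed valid for 2300 <= Re <= 1e6")
    f_fanning = (1.58 * math.log(re) - 3.28) ** -2   # smooth-tube friction
    half_f = f_fanning / 2.0
    numerator = half_f * (re - 1000.0) * pr
    denominator = 1.0 + 12.7 * math.sqrt(half_f) * (pr ** (2.0 / 3.0) - 1.0)
    return numerator / denominator


if __name__ == "__main__":
    # Hypothetical conditions, e.g. a water-like fluid
    for re in (5000.0, 10000.0, 20000.0):
        print(re, round(gnielinski_nusselt(re, 5.0), 2))
```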

Model 2
Other correlations are considered for a developing flow in a circular tube with a small velocity boundary layer thickness. Del Giudice [42] developed a model (Eq. (26a)) for developing-flow heat transfer in a pipe exposed to uniform wall heat flux, taking into account the temperature dependence of viscosity and thermal conductivity, where

$$X^{*} = \frac{L}{D_h\, Re\, Pr}, \qquad n = 0.761\,(Re\,Pr)^{0.0224} - 0.000109\,Re\,Pr$$

Here $Pn_{\mu}$ is the viscosity Pearson number, $Pn_{\mu} = \beta q''_w D / k_e$; $\beta = -(d\mu/dT)/\mu$; $q''_w$ is the heat flux at the tube surface (W/m2); D is the tube inner diameter; and $k_e$ is the TC at the tube entry temperature. Equation (26a) is valid for 5.0 ≤ Pr ≤ 100 and $10^{-4} \le X^{*} \le X^{*}_{max}$. The value of $X^{*}_{max}$ is estimated to be 0.08 for the experimental conditions.

Model 3
Muzychka and Yovanovich [43] presented a model (Eq. (26b)) for predicting Nu in the combined entrance region of a tube, valid for uniform wall flux boundary conditions. For a circular tube, ε = 1, C1 = 3.86, C2 = 1.5, C3 = 0.501, and C4 = 2. Equation (26b) is valid for 0 < Z* < ∞ and 0.01 < Pr < ∞.

Machine learning
The experimental data collected in the preceding section were employed to develop a comprehensive set of models to predict the thermohydraulic behavior of ethylene glycol- and glycerol-based non-porous SiO2 nanofluids. A set of Python-based open-source libraries was used in the Jupyter environment. A total of five ML techniques were employed for the development of the prediction models. The LR, RF, DT, XGBoost, and AdaBoost models were chosen for their various strengths: LR for simplicity and interpretability, RF and XGBoost for excellent prediction accuracy, DT for simple decision-making, and AdaBoost for improving weak learners. LR was used to prepare the baseline model, while the other four, namely Random Forest (RF), Extreme gradient boosting (XGBoost), Adaptive boosting (AdaBoost), and Decision tree (DT), were compared against it. A brief description of each is provided as follows:

Linear regression
Linear regression (LR) is considered the most basic supervised ML algorithm. A linear equation is fitted to the data to model the relationship between the independent variables (features) and a dependent variable (target). It can be expressed as follows:

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \dots + \varepsilon$$

Herein, y is the target (dependent variable), x1, x2, x3, ... are the features (independent variables), β0, β1, β2, ... are the coefficients, and ε denotes the error.
The LR algorithm locates the best-fit line, i.e., the one that minimizes the sum of squared differences between the actual and predicted values.
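The least-squares fit described above can be illustrated without any external library by solving the normal equations directly. The sketch below is an assumption-laden illustration, not the code used in the study: the function name is hypothetical and the toy data are synthetic. In practice a library routine would normally be preferred.

```python
def fit_linear_regression(X, y):
    """Return [b0, b1, ..., bm] minimizing the sum of squared residuals.

    X is a list of feature rows, y the list of targets. The normal
    equations (X^T X) b = X^T y are solved by Gaussian elimination.
    """
    rows = [[1.0] + list(r) for r in X]          # prepend intercept column
    p = len(rows[0])
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    c = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(p)]
    # Forward elimination with partial pivoting
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        c[col], c[pivot] = c[pivot], c[col]
        for r in range(col + 1, p):
            factor = A[r][col] / A[col][col]
            for j in range(col, p):
                A[r][j] -= factor * A[col][j]
            c[r] -= factor * c[col]
    # Back substitution
    beta = [0.0] * p
    for i in reversed(range(p)):
        s = sum(A[i][j] * beta[j] for j in range(i + 1, p))
        beta[i] = (c[i] - s) / A[i][i]
    return beta


if __name__ == "__main__":
    # Synthetic data generated from y = 1 + 2*x1 + 3*x2 (no noise)
    X = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]]
    y = [1, 3, 4, 6, 8]
    print(fit_linear_regression(X, y))
```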

Random Forest
Random Forest (RF) is an ensemble learning method suited to complex regression problems. In the training phase, RF generates a large number of decision trees, each trained on a random portion of the training data and features.
Let X denote a training data set with n samples and m features, and let y be the target variable. T denotes the total number of decision trees in the forest, and X_i denotes a random subset of the training data X, sampled with replacement, where i ranges from 1 to T. Similarly, F_j denotes a random subset of the features, where j ranges from 1 to T.
For each decision tree i, a training sample (X_i, y_i) is randomly selected from (X, y). A decision tree D_i is then trained on (X_i, y_i), employing a split criterion based on the least MSE.
The final prediction of an RF-based model is expressed as:

$$\hat{y} = \frac{1}{T}\sum_{i=1}^{T} D_i(x)$$

where $\hat{y}$ is the forecasted output and $D_i(x)$ denotes the forecast from the i-th decision tree for input x.
To summarize, RF regression aggregates several DTs trained on random subsets of the data to generate robust and precise forecasts for regression problems.
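The bagging-and-averaging idea summarized above can be sketched with very small trees. In the illustration below each "tree" is a single-split stump trained on a bootstrap sample and one randomly chosen feature, which is a deliberate simplification of a full random forest; all function names and the toy data are assumptions for the example.

```python
import random


def fit_stump(X, y, feature):
    """Best single split on one feature by least squared error."""
    best = None
    values = sorted(set(row[feature] for row in X))
    for lo, hi in zip(values, values[1:]):
        thr = 0.5 * (lo + hi)
        left = [t for row, t in zip(X, y) if row[feature] <= thr]
        right = [t for row, t in zip(X, y) if row[feature] > thr]
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        sse = sum((t - ml) ** 2 for t in left) + sum((t - mr) ** 2 for t in right)
        if best is None or sse < best[0]:
            best = (sse, thr, ml, mr)
    if best is None:                      # no split possible: predict the mean
        mean = sum(y) / len(y)
        return (feature, float("inf"), mean, mean)
    return (feature, best[1], best[2], best[3])


def predict_stump(stump, row):
    feature, thr, left_mean, right_mean = stump
    return left_mean if row[feature] <= thr else right_mean


def fit_forest(X, y, n_trees=25, seed=0):
    """Train stumps on bootstrap samples with a random feature each."""
    rng = random.Random(seed)
    n, m = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]   # sample with replacement
        feature = rng.randrange(m)                   # random feature subset
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feature))
    return forest


def predict_forest(forest, row):
    """Average the predictions of all trees."""
    return sum(predict_stump(s, row) for s in forest) / len(forest)


if __name__ == "__main__":
    X = [[i, (i * 7) % 5] for i in range(20)]        # feature 1 is uninformative
    y = [1.0 if i < 10 else 5.0 for i in range(20)]
    forest = fit_forest(X, y)
    print(predict_forest(forest, [2, 0]), predict_forest(forest, [17, 0]))
```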

Decision Tree
Decision Tree (DT) is a fundamental and flexible ML method that can be used for regression. A DT creates a hierarchical tree structure in which internal nodes denote feature-based decisions while leaf nodes represent the predicted values.
Let X denote a training data set with n samples and m features, y the target variable, and D the DT model.
In the training phase, the DT recursively splits the feature space into subgroups on the basis of feature values. At each node, the DT selects the feature and split threshold that keep the MSE as low as possible. In regression, the prediction is made by traversing the tree from the root to a leaf node and assigning to the input sample the mean value of the target variable inside that leaf node.
Mathematically, the DT-based forecast can be expressed in simple terms as:

$$\hat{y}(x) = D(x) = \frac{1}{|R(x)|}\sum_{i \in R(x)} y_i$$

where $R(x)$ is the set of training samples in the leaf node reached by input x. In summary, DT-based regression partitions the feature space recursively and then forecasts using the mean target value inside each region, resulting in simple and interpretable regression models.
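The recursive partitioning described above can be sketched as follows. This is a minimal illustration of a regression tree with least-squared-error splits and mean-valued leaves; the function names, the depth limit, and the toy data are assumptions rather than the configuration used in the study.

```python
def build_tree(X, y, depth=0, max_depth=3, min_samples=2):
    """Recursively split on the (feature, threshold) with least squared error."""
    mean = sum(y) / len(y)
    if depth >= max_depth or len(y) < min_samples:
        return {"value": mean}
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            thr = 0.5 * (lo + hi)
            left = [i for i, row in enumerate(X) if row[j] <= thr]
            right = [i for i, row in enumerate(X) if row[j] > thr]
            sse = 0.0
            for side in (left, right):
                m = sum(y[i] for i in side) / len(side)
                sse += sum((y[i] - m) ** 2 for i in side)
            if best is None or sse < best[0]:
                best = (sse, j, thr, left, right)
    if best is None:                       # all feature values identical
        return {"value": mean}
    _, j, thr, left, right = best
    return {
        "feature": j,
        "threshold": thr,
        "left": build_tree([X[i] for i in left], [y[i] for i in left],
                           depth + 1, max_depth, min_samples),
        "right": build_tree([X[i] for i in right], [y[i] for i in right],
                            depth + 1, max_depth, min_samples),
    }


def predict_tree(node, row):
    """Walk from the root to a leaf and return the leaf mean."""
    while "value" not in node:
        if row[node["feature"]] <= node["threshold"]:
            node = node["left"]
        else:
            node = node["right"]
    return node["value"]


if __name__ == "__main__":
    X = [[1], [2], [3], [10], [11], [12]]
    y = [1, 1, 1, 5, 5, 5]
    tree = build_tree(X, y)
    print(predict_tree(tree, [2]), predict_tree(tree, [11]))
```

An unrestricted tree can reproduce the training data exactly, which is consistent with the perfect training scores and slightly lower test scores typical of single decision trees.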

Extreme gradient boosting
Extreme gradient boosting (XGBoost) belongs to the gradient boosting family of ML methods and is known for its exceptional prediction performance. In gradient boosting, weak learners are combined sequentially to form a robust predictive model. XGBoost uses a gradient descent approach to minimize the loss function L such that

$$L(\theta) = \sum_{i} l(y_i, \hat{y}_i) + \sum_{k=1}^{K} \Omega(f_k)$$

Here, θ represents the parameters of the model, l is the per-sample loss, $y_i$ and $\hat{y}_i$ denote the actual and predicted values, K denotes the number of weak learners in the form of trees, and $\Omega(f_k)$ denotes the regularization term applied to each tree.
XGBoost uses an additive approach for model building:

$$\hat{y}_i = \sum_{k=1}^{K} f_k(x_i)$$

Herein, $f_k(x_i)$ represents the prediction of the k-th tree for the i-th sample. Regularization techniques such as L1 and L2 are used to control the complexity of the individual trees:

$$\Omega(f) = \gamma T + \frac{1}{2}\lambda \lVert \omega \rVert^{2}$$

Herein, T is the number of leaves in the tree, ω denotes the leaf weights, and γ and λ are regularization parameters.
XGBoost also provides insight into feature importance by calculating the gain, which assesses the contribution of each feature to the model's performance.
To summarize, XGBoost improves the model's accuracy by successively minimizing a loss function with gradient descent and incorporating regularization to control model complexity. Its ability to report feature importance and handle missing values renders it an effective tool for regression problems across multiple domains.
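The sequential, regularized fitting summarized above can be illustrated with a heavily simplified booster. The sketch below is not the XGBoost library: it boosts single-split stumps on one feature under squared-error loss (gradient g = ŷ − y, Hessian h = 1), uses the regularized leaf weight −G/(H + λ) and a split gain penalized by γ. Function names, hyperparameter values, and data are assumptions.

```python
def fit_boosted_stumps(x, y, n_rounds=50, learning_rate=0.3, lam=1.0, gamma=0.0):
    """Gradient boosting of stumps with L2-regularized leaf weights."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    xs = sorted(set(x))
    thresholds = [0.5 * (a + b) for a, b in zip(xs, xs[1:])]
    stumps = []
    for _ in range(n_rounds):
        g = [p - t for p, t in zip(pred, y)]      # gradient of 0.5*(y - p)^2
        G, H = sum(g), float(len(g))              # Hessian is 1 per sample
        best = None
        for thr in thresholds:
            GL = sum(gi for gi, xi in zip(g, x) if xi <= thr)
            HL = float(sum(1 for xi in x if xi <= thr))
            GR, HR = G - GL, H - HL
            gain = 0.5 * (GL ** 2 / (HL + lam) + GR ** 2 / (HR + lam)
                          - G ** 2 / (H + lam)) - gamma
            if best is None or gain > best[0]:
                best = (gain, thr, -GL / (HL + lam), -GR / (HR + lam))
        if best is None or best[0] <= 0.0:
            break                                  # no split improves the objective
        _, thr, wl, wr = best
        wl, wr = learning_rate * wl, learning_rate * wr   # shrinkage
        stumps.append((thr, wl, wr))
        pred = [p + (wl if xi <= thr else wr) for p, xi in zip(pred, x)]
    return base, stumps


def predict_boosted(model, xi):
    """Additive prediction: base value plus the sum of all stump outputs."""
    base, stumps = model
    return base + sum(wl if xi <= thr else wr for thr, wl, wr in stumps)


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
    y = [v * v for v in x]                        # synthetic nonlinear target
    model = fit_boosted_stumps(x, y)
    print([round(predict_boosted(model, v), 1) for v in x])
```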

Adaptive boosting
Adaptive boosting (AdaBoost) combines multiple weak learners h(x) to form a strong predictor F(x). In mathematical terms, the final predictor F(x) is a weighted sum of the weak learners:

$$F(x) = \sum_{t=1}^{T} \alpha_t h_t(x)$$

Herein, T is the total number of weak learners, $\alpha_t$ is the weight allotted to the t-th weak learner, and $h_t(x)$ is the prediction of the t-th weak learner.
In the training phase, AdaBoost allots a weight $w_i$ to each training instance $(x_i, y_i)$, where $x_i$ denotes the input and $y_i$ the true label. Initially, all weights are set equally as:

$$w_i = \frac{1}{N}$$

where N is the total number of instances. In each iteration t, AdaBoost fits a weak learner to the weighted training data and subsequently computes the weighted error $\varepsilon_t$ of that weak learner:

$$\varepsilon_t = \frac{\sum_{i=1}^{N} w_i\, \mathbb{1}\left(h_t(x_i) \neq y_i\right)}{\sum_{i=1}^{N} w_i}$$

Herein, $\mathbb{1}(\cdot)$ is the indicator function. The weight $\alpha_t$ of the t-th weak learner is then estimated from $\varepsilon_t$, and AdaBoost updates the weights of the training instances on the basis of the misclassification error. In the end, the weighted weak learners are merged to form the final predictor, F(x). This process continues until a predetermined number of iterations is completed or the error has been sufficiently reduced [44].
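The weighting scheme described above can be illustrated with the classic discrete AdaBoost for labels in {−1, +1}. The learner weight α_t = ½ ln((1 − ε_t)/ε_t) and the exponential weight update used here are the standard textbook choices and are assumed, not confirmed, to match the variant used in the study; function names and data are hypothetical.

```python
import math


def fit_adaboost(x, y, n_rounds=3):
    """Discrete AdaBoost with threshold stumps on a single feature."""
    n = len(x)
    w = [1.0 / n] * n                                  # equal initial weights
    xs = sorted(set(x))
    # A threshold below the minimum gives a constant learner as well
    thresholds = [xs[0] - 1.0] + [0.5 * (a + b) for a, b in zip(xs, xs[1:])]
    learners = []
    for _ in range(n_rounds):
        best = None
        for thr in thresholds:
            for sign in (1, -1):
                preds = [sign if xi > thr else -sign for xi in x]
                err = sum(wi for wi, p, t in zip(w, preds, y) if p != t)
                if best is None or err < best[0]:
                    best = (err, thr, sign, preds)
        err, thr, sign, preds = best
        err = min(max(err, 1e-12), 1.0 - 1e-12)        # avoid log(0)
        alpha = 0.5 * math.log((1.0 - err) / err)      # learner weight
        learners.append((alpha, thr, sign))
        # Increase weights of misclassified points, decrease the others
        w = [wi * math.exp(-alpha * t * p) for wi, t, p in zip(w, y, preds)]
        total = sum(w)
        w = [wi / total for wi in w]
    return learners


def predict_adaboost(learners, xi):
    """Sign of the weighted vote of all weak learners."""
    score = sum(alpha * (sign if xi > thr else -sign)
                for alpha, thr, sign in learners)
    return 1 if score >= 0 else -1


if __name__ == "__main__":
    x = [1, 2, 3, 4, 5, 6]
    y = [1, 1, -1, -1, 1, 1]          # not separable by a single stump
    learners = fit_adaboost(x, y, n_rounds=3)
    print([predict_adaboost(learners, xi) for xi in x])
```

No single stump classifies this toy set correctly, but the weighted combination of three stumps does, which is the sense in which boosting turns weak learners into a stronger predictor.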

Experimental test validations
The experimental setup was validated by comparison with data obtained for water. The experimental Nusselt number values for the flow of water, as presented in Fig. 4, were compared with the correlation of Gnielinski [41]. The data agreed with the predicted values within ±7.4%. Further validation was carried out by comparing the experimental FF with Eq. (27) for turbulent flow in rough pipes [45]. Excellent agreement with the experimental data was observed. Following the validation with water, heat transfer experiments proceeded with the 30GW base liquid and NF concentrations of 0.25, 0.5, 0.75, and 1.0% for the flow range 6 to 12 LPM.

HTC and FF
The variation of HTC with flow rate for different concentrations of SiO2 NFs is presented in Fig. 5. The HTC of the base liquids and NFs increased with flow rate. The increase in heat transfer reached a maximum for the three NFs at 1.0% volume concentration. The less viscous SiO2-EG NFs exhibited a more significant HTC enhancement over the base liquid than the more viscous SiO2-G NFs under similar conditions. The increase in HTC of the SiO2-EG and SiO2-G NFs was determined as 5.9% and 1.9%, respectively, for a 1.0% volume fraction at a 12 LPM flow rate. This behavior could be explained by the flattening of the velocity profile and the delay in boundary layer development in the fully developed region, among others [46,47]. The NF heat transfer augmentation can also be attributed to the effective TC increase caused by reduced viscosity near the wall, the larger surface area of the NPs, and particle reconfiguration [48,49].

$$\text{Gain} = \frac{\text{Gain in splitting criteria}}{\text{Number of times the feature is used for splitting the data}} \qquad (31)$$

Figure 6 displays the variation of Nu with Re for various SiO2 volume fractions. The figure shows a similar increase over the base liquid for all nanofluids. The rate of increase is more significant for SiO2-EG NFs in the turbulent flow than for SiO2-G NFs at higher Re in the laminar flow range. Increasing the concentration enhances the Nu, possibly due to particle migration, TC enhancement, and thinning of the boundary layer. The enhancement in Nu at Re of 19,000 and 2,300 with 1.0% volume concentration of the NFs is 1.4 and 1.1%, respectively. The findings are consistent with the results for SiO2/water in the laminar range, where the heat transfer improvement with increasing Re is relatively small [38,50].
Figure 7 depicts a comparison of the base liquid Nu with single-phase theory. As observed, the correlations closely predict the Nu of the base liquids. The average absolute deviations of the Del Giudice et al. [42] and Muzychka and Yovanovich [43] correlations from the experimental data are 1.6 and 3.8% for SiO2-EG, respectively, while the deviations are 0.9 and 6.8% for SiO2-G NFs. The deviation of the observed values from those estimated with the Muzychka and Yovanovich [43] correlation increased with Re. At a Re of 22,000, for instance, a maximum absolute deviation of 1.84% was determined. As the results show, the correlations for single-phase flow can be used to predict the base liquid HTC with only slight deviation. Similar experimental evidence can be found in the work of Hwang et al. [46].
The FF variation with Re is illustrated in Fig. 8. The FF decreases marginally with concentration and significantly with Re. Compared to the base liquid, the FF at 1.0% concentration decreased by 20.6% for SiO2-EG and increased by 4.6% for SiO2-G NFs, at Re of 13,000 and 2000, respectively. The decrease in the SiO2-EG nanofluid FF might be due to the turbulent nature of the flow, as compared to SiO2-G at 1.0% concentration. At 1.0% concentration, the viscosity of SiO2-G is approximately 10 times greater than that of SiO2-EG, and flow in the laminar range of Re might be the reason for the enhancement in FF. Further, glycerol experiences greater friction between its adjacent fluid layers and exerts greater flow resistance than ethylene glycol on external energy exposure. Also, the hydrogen bonding between glycerol molecules is significantly stronger, which means more external energy is needed to overcome the intermolecular attraction forces and cause the liquid to flow. A theoretical analysis was undertaken following the technique explained by Sharma et al. [30] to understand the flow characteristics in detail.
As seen in Figs. 9 and 10, the surface temperature decreases with increasing flow rate and Re. Figure 9 shows the surface temperature variation with flow rate for SiO2-G and SiO2-EG NFs. The wall temperature of SiO2-G is lower than that of SiO2-EG NFs for a given concentration. The heat capacity of the SiO2-G NF is approximately 17% greater than that of SiO2-EG, which might be the cause of the lower wall temperatures observed. Also, the flow velocities with G are lower than with EG, which might be another reason for the lower wall temperatures with SiO2-G. In Fig. 10, the surface temperature varies less with Re for SiO2-G than for SiO2-EG NF.
The effect of concentration on the evolution of the nondimensional flow velocity with dimensionless length, established in the earlier work by Sharma et al. [51], is shown in Fig. 11. The velocity profile of SiO2-G is relatively flatter than that of SiO2-EG NFs. The flattening of the dimensionless velocity, as compared to the base liquid, was more pronounced with the SiO2-G NF owing to the motion of the NPs. NPs can move either towards the tube wall or towards the axis region, depending on the magnitude of the density ratio of the NPs to the base liquid. When the NPs move more rapidly than the fluid, they migrate toward the tube wall and the velocity profile flattens. When the fluid moves more quickly, the particles drift toward the axis of the tube [46,51].
Figure 12 displays the predicted dimensionless temperature distribution as a function of dimensionless distance. As can be seen, the temperature decreases with concentration. The temperature profile of the SiO2-G NF shows a higher temperature than the base liquid profile. The NF is associated with a decreasing temperature gradient in the flow region away from the tube surface. The temperature gradients of the SiO2-EG and SiO2-G nanofluids are compared in Fig. 13. The results show a logarithmic growth of the temperature gradient with increasing Re for SiO2-G NFs, whereas the SiO2-EG NFs display the inverse trend. More striking was the increasing temperature gradient for the SiO2-G NF. The Nu does not vary significantly between the G- and EG-based NFs. These findings agree with the earlier observation of higher HTCs with low-viscosity NFs [10].

Machine learning-based model prediction
The experimental data and results collected in the experimental analysis were used to create predictive models for the friction factor and Nu. The dataset contains the Reynolds and Prandtl numbers estimated from the test conditions and results. These data serve as the foundation for training and testing the predictive models. It is often necessary to preprocess raw experimental data to handle missing values, outliers, and other issues [52,53]. To prepare the dataset for model training, Python tools such as Pandas were used to clean and organize it. Understanding the structure of the dataset is crucial for gaining insight. Python libraries such as Matplotlib and Seaborn make it straightforward to visualize the data, explore the relationships between variables, and identify trends and underlying patterns. The descriptive statistics listed in Table 3 offer insight into the characteristics of the dataset. For Re, a mean value of 8154.069 and a standard deviation of 6569.183 were noted. The IQR spans from the 25th percentile at 2209.716 to the 75th percentile at 14,498.13. A negative kurtosis was observed for Re, indicating a slightly flatter shape compared to a normal distribution. For Pr, a mean value of 117.947 and a standard deviation of 94.784 were observed. The IQR spans from 26.614 to 246.968, with a median value (50th percentile) of 80.166. The kurtosis for Pr is negative, indicating a distribution that is slightly flatter than the normal distribution [54,55].
For Nu, the mean was estimated at 28.115 and the standard deviation at 2.288. The kurtosis for Nu was also negative, suggesting a distribution slightly flatter than normal. For the friction factor, a mean value of 0.176 and a standard deviation of 0.052 were observed. The IQR spans from 0.135 to 0.208, with a median value of 0.168. The kurtosis of the friction factor, however, is positive, indicating a slightly more peaked distribution than the normal distribution.
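The summary statistics discussed above (mean, standard deviation, quartiles, and kurtosis) can be computed with the Python standard library alone, as sketched below. The function name and sample data are assumptions; the kurtosis here is the simple population excess kurtosis, and data-analysis libraries may apply a bias correction, so their values can differ slightly.

```python
import statistics


def describe(values):
    """Return basic descriptive statistics for a list of numbers."""
    mean = statistics.mean(values)
    std = statistics.stdev(values)                       # sample std (n - 1)
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    m2 = sum((v - mean) ** 2 for v in values) / len(values)
    m4 = sum((v - mean) ** 4 for v in values) / len(values)
    return {
        "mean": mean,
        "std": std,
        "25%": q1,
        "50%": median,
        "75%": q3,
        # Negative: flatter than normal; positive: more peaked
        "excess_kurtosis": m4 / m2 ** 2 - 3.0,
    }


if __name__ == "__main__":
    sample = [1.0, 2.0, 3.0, 4.0, 5.0]                   # illustrative data
    for name, value in describe(sample).items():
        print(f"{name}: {value:.4f}")
```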
Overall, these descriptive statistics provide a comprehensive overview of the dataset, including measures of central tendency, dispersion, and shape of the distributions for each variable.

Nusselt number model
A predictive model for Nu was built following the data analysis, which included a correlation heatmap and descriptive statistical analysis. The data were randomly split at a ratio of 70:30 for training and testing the models. The five ML approaches LR, RF, XGBoost, AdaBoost, and DT were employed to develop the prediction models, which were then used for prediction. The comparison of actual vs predicted Nu is displayed in Fig. 15a-e: Fig. 15a for the LR-based model, Fig. 15b for DT, Fig. 15c for RF, Fig. 15d for XGBoost, and Fig. 15e for AdaBoost. All models except LR performed satisfactorily; however, the XGBoost-based model was superior to the other models [56-58].
The statistical evaluation of the Nu models developed with various methods was conducted and the results are listed in Table 4.In case of LR-based model a training MSE of 1.9 and a test phase MSE of 2.26 were observed.This indicates a poor level of performance indicating that the error caused by LR is substantially larger in comparison to that of other models.Given that the R 2 values for LR are 0.651 for training and 0.35 for testing, it may be inferred that LR only explains a moderate amount of the variance in Nu.In the case of DT-based model, it showed a flawless performance, demonstrating MSE as 0 and R 2 as 1.These values indicates that there were almost nil prediction errors.The DT-based Nu model demonstrated stellar performance during model testing also.It displayed a Test MSE of 0.095 and a Test R 2 of 0.972 59,60 .
The performance of RF-based models was good as it had a train MSE of 0.0108 and a Test MSE of 0.069, indicating only a few errors throughout both the training and testing phases of the model development process.The R 2 values in the case of RF were fairly high, at 0.998 during training and 0.98 for testing.This shows that RF may capture a significant portion of the volatility in Nu.Furthermore, XGBoost performs very well, as indicated by its Train MSE of 0.00001 and Test MSE of 0.045, both of which imply a low number of errors.With a training value of 0.9999 and a testing value of 0.9871, the XGBoost-based model's R 2 values are very high, suggesting that the model has an excellent link to the data.Given that AdaBoost has a Train MSE of 0.276 and a Test MSE of 0.4451, indicating that it has few mistakes, its performance is fairly excellent.AdaBoost's R 2 values of 0.9496 for training and 0.8725 for testing indicate that the model fits the training data well, while the test data shows a minor reduction in performance [61][62][63] .
On the basis of the statistical evaluation, RF and XGBoost stand out as the best models for predicting Nu, because they give low error metrics and high R² values for both the training and test datasets. These models capture the complexity of the data and generate accurate Nu predictions, which makes them suitable for this regression task.
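The train and test figures quoted above for the Nu models can be collected and ranked with a few lines of code. The sketch below copies the values from the text; the ranking criterion (test R²) and the train-to-test R² drop are illustrative choices, not a procedure stated by the authors.

```python
# Nu model metrics as quoted in the text:
# (train MSE, test MSE, train R2, test R2)
reported = {
    "LR": (1.9, 2.26, 0.651, 0.35),
    "DT": (0.0, 0.095, 1.0, 0.972),
    "RF": (0.0108, 0.069, 0.998, 0.98),
    "XGBoost": (0.00001, 0.045, 0.9999, 0.9871),
    "AdaBoost": (0.276, 0.4451, 0.9496, 0.8725),
}

# Rank the models by test R2, highest first.
ranking = sorted(reported, key=lambda name: reported[name][3], reverse=True)

# Drop in R2 from training to testing, a rough indicator of overfitting.
r2_drop = {name: round(vals[2] - vals[3], 4) for name, vals in reported.items()}

# Model with the smallest test MSE.
lowest_test_mse = min(reported, key=lambda name: reported[name][1])

for name in ranking:
    train_mse, test_mse, train_r2, test_r2 = reported[name]
    print(f"{name:9s} test R2={test_r2:.4f}  test MSE={test_mse:.4f}  R2 drop={r2_drop[name]:.4f}")
```

On these quoted values, XGBoost and RF occupy the first two places by test R², and LR shows by far the largest drop between training and testing.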
The models were further compared visually using Taylor's diagram (Fig. 16) and violin plots (Fig. 17). For Nu prediction during training, both the DT and XGBoost models were superior to the other models, with the XGBoost-based model the best. In testing, XGBoost was again the best of the five models. The improved performance of the RF and XGBoost models is primarily attributable to their robustness in dealing with complex, nonlinear interactions, as well as their ability to limit overfitting through ensemble approaches. The violin plots are shown in Fig. 17a for the training phase and Fig. 17b for the testing phase. Here too, the XGBoost-based model was superior to the other models, as seen from the shape of the violin plots and the median lines on the plots.
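A Taylor diagram places each model according to three statistics: the standard deviation of its predictions, their correlation with the actual values, and the centered root-mean-square difference. These are linked by a law-of-cosines relation, which is what allows all three to be read from a single plot. The sketch below computes them for made-up numbers; the function name and the sample values are assumptions for illustration only.

```python
import math


def taylor_statistics(actual, predicted):
    """Return (std_actual, std_predicted, correlation, centered RMS difference)."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    dev_a = [a - mean_a for a in actual]
    dev_p = [p - mean_p for p in predicted]
    # Population standard deviations of the two series.
    std_a = math.sqrt(sum(d * d for d in dev_a) / n)
    std_p = math.sqrt(sum(d * d for d in dev_p) / n)
    # Pearson correlation coefficient.
    corr = sum(da * dp for da, dp in zip(dev_a, dev_p)) / (n * std_a * std_p)
    # RMS difference after removing each series' mean.
    crmsd = math.sqrt(sum((dp - da) ** 2 for da, dp in zip(dev_a, dev_p)) / n)
    return std_a, std_p, corr, crmsd


# Illustrative numbers only, not data from the study.
actual = [10.0, 12.0, 15.0, 19.0, 24.0]
predicted = [11.0, 12.5, 14.0, 20.0, 23.0]

std_a, std_p, corr, crmsd = taylor_statistics(actual, predicted)

# Law-of-cosines identity underlying the diagram:
# crmsd^2 = std_a^2 + std_p^2 - 2 * std_a * std_p * corr
law_of_cosines = math.sqrt(std_a**2 + std_p**2 - 2.0 * std_a * std_p * corr)

print(std_a, std_p, corr, crmsd, law_of_cosines)
```

A model plotted close to the reference point has a standard deviation near that of the actual data, a correlation near one, and a small centered RMS difference.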

Friction factor model
For the friction factor as well, the five machine learning algorithms (LR, RF, XGBoost, AdaBoost, and DT) were used to build prediction models, which were then used to make predictions. Figure 18a-e compares the actual and predicted friction factor values. Figure 18a shows the actual and predicted values for the LR-based model.

The statistical evaluation shows that RF and XGBoost are the best models for predicting the friction factor, because they provide low error metrics and high R² values across both the training and test datasets. These models are appropriate for this regression task because they capture the complexity of the data and give reliable friction factor predictions.
The models were further examined visually using Taylor's diagram and violin plots to compare their performance.

Conclusions
The work presents a convective heat transfer coefficient and friction factor assessment of ethylene glycol and glycerol-based non-porous silicon dioxide nanofluid flow in a tube under constant heat flux boundary conditions. The experiments were conducted for Reynolds numbers between 1300 and 21,000 and concentrations ranging from 0 to 1.0% volume at approximately 80 °C. The heat transfer coefficient and friction factor data were analyzed based on experimental thermophysical properties correlations. To model and predict the complex and nonlinear

Figure 1. (a) SEM image of SiO2 nanoparticle and (b) EDX analysis of the selected area in the SEM image.

Figure 4. Comparison of Nu between present work and model predictions for flow of distilled water in a tube.

Figure 5. Variation of HTC with flow rate for the three NFs.

Figure 6. Comparison of Nu for the three NFs with Re.

Figure 7. Comparison of Nu for the three base liquids with theory.

Figure 8. Comparison of FF for the three NFs with Re.

Figure 9. Variation of tube surface temperature with flow rate for the three NFs.

Figure 10. Comparison of tube surface temperature for base liquid and NFs at 1.0% concentration.

Figure 11. Variation of dimensionless velocity with radial distance for the three NFs.

Figure 12. Variation of dimensionless temperature with radial distance for the three NFs.

Figure 13. Comparison of temperature gradient with flow Re for the three NFs.
The DT-based model fitted the training data perfectly, with an MSE of zero and an R² of one, indicating essentially no prediction error in training. The DT-based friction factor model also performed well in testing, with a test MSE of 0.00014 and an R² of 0.94. The RF-based model performed well, with a training MSE of 0.00001 and a test MSE of 0.00007, indicating small errors in both phases. Its R² values were high, at 0.994 for training and 0.97 for testing, which suggests that RF captures most of the variance in the friction factor. XGBoost also performed well, with a training MSE of 0.000002 and a test MSE of 0.0001. Its R² values of 0.999 for training and 0.958 for testing indicate a close fit to the data. AdaBoost gave a training MSE of 0.00026 and a test MSE of 0.00036, the largest errors among these four models, although its performance remains reasonably good. Its R² values of 0.906 for training and 0.852 for testing suggest that the model fits the training data well, with a modest loss in performance on the test data.
Figure 19 shows Taylor's diagram, and Fig. 20 shows the violin plots for all models. For friction factor prediction during training, both the RF and XGBoost models outperformed the other models, with the XGBoost-based model the best. In testing, XGBoost again outperformed the other four models. The violin plots are shown in Fig. 20a for the training phase and Fig. 20b for the testing phase. Here too, the XGBoost-based model outperformed the other models, as seen from the shape of the violin plots and the median lines on the plots.

Figure 16. Taylor's diagram for Nu model during (a) training and (b) testing phase.

Figure 17. Violin plots for Nu model for (a) training and (b) testing phase.

Figure 20. Violin plots for friction factor model for (a) training and (b) testing phase.

Table 4. Statistical evaluation results of the Nu model.